Abstract. We design a family of program analyses for JavaScript that
make no approximation in matching calls with returns, exceptions with
handlers, and breaks with labels. We do so by starting from an established
reduction semantics for JavaScript and systematically deriving its
intensional abstract interpretation. Our first step is to transform the
semantics into an equivalent low-level abstract machine: the JavaScript
Abstract Machine (JAM). We then give an infinite-state yet decidable
pushdown machine whose stack precisely models the structure of the
concrete program stack. The precise model of stack structure in turn confers
precise control-flow analysis even in the presence of control effects, such
as exceptions and finally blocks. We give pushdown generalizations of
traditional forms of analysis such as k-CFA, and prove the pushdown
framework for abstract interpretation is sound and computable.



1 Introduction 

JavaScript is the dominant language of the web, making it the most ubiquitous 
programming language in use today. Beyond the browser, it is increasingly im- 
portant as a general-purpose language, as a server-side scripting language, and 
as an embedded scripting language; notably, Java 6 includes support for
scripting applications via the javax.script package, and the JDK ships with the
Mozilla Rhino JavaScript engine. Due to its ubiquity, JavaScript has become the
target language for an array of compilers for languages such as C#, Java, Ruby, 
and others, making JavaScript a widely used "assembly language." As JavaScript 
cements its foundational role, the importance of robust static reasoning tools for 
that foundation grows. 

Motivated by the desire to handle non-local control effects such as exceptions 
and finally precisely, we will depart from standard practice in higher-order 
program analysis to derive an infinite-state yet decidable pushdown abstraction 
from our original abstract machine. The stack of the pushdown abstract inter- 
preter exactly models the stack of the original abstract machine with no loss of 
structure — approximation is inflicted on only the control states. This pushdown 
framework offers a degree of precision in reasoning about control inaccessible to 
previous analyzers. 

Pushdown analysis is an alternative paradigm for the analysis of higher-order
programs in which the run-time program stack is precisely modeled with
the stack of a pushdown system [40, 14]. Consequently, a pushdown analysis can
exactly match control flow transfers from calls to returns, from throws to
handlers, and from breaks to labels. This is in contrast with the traditional
approaches of finite-state abstractions, which necessarily model the control
stack with finite bounds.

As an example demonstrating the basic difference between traditional
approaches, such as 0CFA, and our pushdown approach, consider the following
JavaScript program:

// (R → R) → (R → R)
// Compute an approximate derivative of f.
function deriv(f) {
  var e = 0.0001;
  return function (x) {
    return (f(x+e) - f(x-e)) / (2*e);
  };
};

deriv(function (y) { return y*y; });

The deriv program computes an approximation to the derivative of its
argument. In this example, it is being applied to the square function, so it
returns an approximation to the double function.
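As a quick sanity check of the claim above, the following sketch applies the result of deriv to a sample point. The names square, approxDouble, and sample are illustrative and do not appear in the original program; the tolerance used in any comparison is an assumption about floating-point rounding, not a property stated in the text.

```javascript
// Same definition as in the text.
function deriv(f) {
  var e = 0.0001;
  return function (x) {
    return (f(x + e) - f(x - e)) / (2 * e);
  };
}

// Illustrative names (not from the original program).
var square = function (y) { return y * y; };
var approxDouble = deriv(square);

// For a quadratic, the central difference equals 2*x up to rounding,
// so approxDouble(3) should be very close to 6.
var sample = approxDouble(3);
console.log(sample);
```

Note that the returned closure contains the two distinct call sites f(x+e) and f(x-e); each must return to its own continuation for the subtraction to be meaningful.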

It is important to take note of the two distinct calls to f. Basic program
analyses, such as 0CFA, will determine that the square function is the target of
the call at f(x+e). However, they cannot determine whether the call to f(x+e)
should return to f(x+e) or to f(x-e). Context-sensitive analysis, such as 1CFA,
can reason more precisely by distinguishing the analysis of each call to f;
however, such methods come with a prohibitive computational cost [38] and, more
fundamentally, k-CFA will only suffice for precisely reasoning about the control
stack up to a fixed calling context depth.

This is the fundamental shortcoming of traditional approaches to higher- 
order program analysis, both in functional and object-oriented languages. This 
is an unfortunate situation, since the dominant control mechanism is calls and 
returns. To make matters worse, in addition to higher-order functions, JavaScript 
includes sophisticated control mechanisms further complicating and confounding 
analytic approaches. 

To overcome this shortcoming we use a pushdown approach to abstraction 
that exactly captures the behavior of the control stack. We derive the pushdown 
analysis as an abstract interpretation of an abstract machine for JavaScript. 
The crucial difference between our approach and previous approaches is that we 
will leave the stack unabstracted. As this abstract interpretation ranges over an 
infinite state-space, the main technical difficulty will be recovering decidability 
of reachable states. 

Challenges from JavaScript 

JavaScript is an expressive, aggressively dynamic, high-level programming
language. It is a higher-order, imperative, untyped language that is both
functional and object-oriented, with prototype-based inheritance, constructors,
non-local control, and a number of semantic quirks. Most quirks simply demand
attention to detail, e.g.:

if (false) { var x; }
... x ... // x is defined

Other quirks, such as the much-maligned with construct, end up succumbing
to an unremarkable desugaring. Yet other features, like non-local control effects 
and prototypical inheritance, require attention in the mechanics of the analysis 
itself; for a hint of what is possible, consider: 

out: while (true)
  try {
    break out;
  } finally {
    try {
      return 10;
    } finally {
      console.log("this runs; 10 returns");
    }
  }
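To make the behavior of this fragment observable, the sketch below wraps it in a function, since return is only legal inside a function body. The wrapper name runControlDemo and the log array are assumptions introduced for illustration; the control structure is the same as in the fragment above.

```javascript
// Hypothetical wrapper around the fragment from the text.
function runControlDemo() {
  var log = [];
  out: while (true)
    try {
      break out;                 // starts a break toward the label "out"
    } finally {
      try {
        return { value: 10, log: log };  // the return replaces the pending break
      } finally {
        log.push("this runs; 10 returns");  // still runs before the return completes
      }
    }
}

var demoResult = runControlDemo();
console.log(demoResult.value, demoResult.log);
```

The pending break is intercepted by the outer finalizer, whose return supersedes it, and the inner finalizer still runs before the function returns. An analysis must track all three transfers to model this program faithfully.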

It has become customary when reasoning about JavaScript to assume well- 
behavedness — that some subset of its features are never (or rarely) used for 
many programs. Richards, Lebresne, Burg and Vitek's thorough study [34] has
cast empirical doubt on these well-behavedness assumptions, finding almost ev- 
ery language feature used in almost every program in a large corpus of widely 
deployed JavaScript code. 

Our goal is a principled approach for reasoning about all of JavaScript, in- 
cluding its unusual semantic peculiarities and its complex control mechanisms. 
To make this possible, the first step is the calculation of an abstractable abstract 
machine from an established semantics for JavaScript. From there, a pushdown 
abstract interpretation of that machine yields a sound, robust framework for 
static analysis of JavaScript with precise reasoning about the control stack. 

Contributions 

The primary contribution of this work is a provably sound and computable 
framework for infinite-state pushdown abstract interpretations of all of JavaScript, 
sans eval, that exactly models the program stack including complex local and 
non-local control-flow relationships, such as proper matching between calls and 
returns, throws and handlers, and breaks and labels.¹

¹ One might wonder why break to a label requires non-local reasoning. In fact, it
should not require it, but Guha et al.'s desugaring into $\lambda_{JS}$ handles
break using powerful escape continuations. Constraints in the desugaring process
prevent these escape continuations from crossing interprocedural boundaries, but
unrestricted (or optimized) $\lambda_{JS}$ may violate these constraints. For
completeness, we handle full, unrestricted $\lambda_{JS}$, which means we must
model these general escape continuations.

In support of our primary contribution, our secondary contributions include
the development of a variant of a known formal model for JavaScript as a
calculus of explicit substitutions; a correct abstract machine for this model
obtained via a detailed derivation, carried out in SML, going from the calculus
to the machine via small, meaning-preserving program transformations; and
executable semantic models for the reduction semantics, the abstract machine and
its pushdown abstractions, written in PLT Redex [15].

Outline

Section 2 gives the syntax and semantics of a core calculus of explicit
substitutions based on the $\lambda_{JS}$-calculus. This new calculus,
$\lambda\rho_{JS}$, is shown to correspond with $\lambda_{JS}$. Section 3 derives
an abstract machine, the JavaScript Abstract Machine (JAM), from the calculus of
explicit substitutions, which is a correct machine for evaluating $\lambda_{JS}$
programs. The machine has been crafted in such a way that it is suitable for
approximation by a pushdown automaton. Section 4 yields a family of pushdown
abstract interpreters by a simple store-abstraction of the JAM. The framework is
proved to be sound and computable. Specific program analyses are obtained by
instantiating the allocation strategy for the machine, and examples of strategies
corresponding to pushdown generalizations of known analyses are given. Section 5
relates this work to the research literature and Section 6 concludes.

Background and notation: We assume a basic familiarity with reduction semantics
and abstract machines. For background on concepts, terminology, and notation
employed in this paper, we refer the reader to Semantics Engineering with
PLT Redex [15]. Our construction of machines from reduction semantics follows
Danvy, et al.'s refocusing-based approach [13, 4, 12]. Finally, for background on
systematic abstract interpretation of abstract machines, see our recent work on
the approach [39].

2 A calculus of explicit substitutions: $\lambda\rho_{JS}$

Our semantics-based approach to analysis is founded on abstract machines,
which give an idealized characterization of a low-level language
implementation. As such, we need a correct abstract machine for JavaScript.
Rather than design one from scratch and manually verify its correctness after
the fact, we rely on the syntactic correspondence between calculi and machines
and adopt the refocusing-based approach of Danvy, et al., to construct an
abstract machine systematically from an established semantics for JavaScript.

Guha, Saftoiu, and Krishnamurthi [17] give a small core calculus,
$\lambda_{JS}$, with a small-step reduction semantics using evaluation contexts
and demonstrate that full JavaScript can be desugared into $\lambda_{JS}$. The
semantics accounts for all of JavaScript's features with the exception of eval.
Only some of JavaScript's quirks are modeled directly, while other aspects are
treated traditionally. For example, lexical scope is modeled with substitution.
The desugarer is modeled formally and also available as a standalone Haskell
program.

We choose to adopt the $\lambda_{JS}$ model since its small size results in a
tractably sized abstract machine.

The remainder of this paper focuses on machines and abstract interpretation
for $\lambda_{JS}$. We refer the reader to Guha, et al., for details on
desugaring JavaScript to $\lambda_{JS}$ and the rationale for the design
decisions made.

2.1 Syntax 

The syntax of $\lambda\rho_{JS}$ is given in figure 1. Syntactic constants
include strings, numbers, addresses, booleans, the undefined value, and the null
value. Addresses are first-class values used to model mutable references. Heap
allocation and dereference is made explicit through desugaring to
$\lambda_{JS}$. Syntactic values include constants, function terms, and records.
Records are keyed by strings and operations
on records are modeled by functional update, extension, and deletion. Expres- 
sions include variables, syntactic values, and syntax for let binding, function ap- 
plication, record dereference, record update, record deletion, assignment, alloca- 
tion, dereference, conditionals, sequencing, while loops, labels, breaks, exception 
handlers, finalizers, exception raising, and application of primitive operations. A 
program is a closed expression. 
Fig. 1: Syntax of $\lambda_{JS}$



2.2 Semantics 



Guha, et al., give a substitution-based reduction semantics formulated in terms
of Felleisen-Hieb-style evaluation contexts. The use of substitution in
$\lambda_{JS}$ is traditional from a theoretical point of view, and is motivated
in part by want of conventional reasoning techniques such as subject reduction.
On the other hand, environments are traditional from an implementation point of
view. To mediate the gap, we first develop a variant of $\lambda_{JS}$ that
models the meta-theoretic notion of substitution with explicit substitutions.

Substitutions are represented explicitly with environments, which are finite 
maps from variables to values. 

Substitution $[v/x]e$, which is a meta-theoretic notation denoting $e$ with all
free occurrences of $x$ replaced by $v$, is represented at the syntactic level
in $\lambda\rho_{JS}$, a calculus of explicit substitutions, as a pair
consisting of $e$ and an environment representing the substitution:
$\langle e, \{(x, v)\}\rangle$. Such a pair is known as a closure.

The heap is modeled as a top-level store, a finite map from addresses to 
values. 
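The relationship between meta-level substitution and closures can be sketched on a toy term language. Everything in this block is an assumption made for illustration: the term constructors num, vr, and add, and the functions subst, evalTerm, and evalClosure, are hypothetical and far smaller than the calculus in the paper. The point is only that evaluating the closure of e with the environment {(x, v)} agrees with evaluating e after substituting v for x.

```javascript
// Hypothetical toy terms: numbers, variables, and addition.
var num = function (n) { return { tag: "num", n: n }; };
var vr = function (name) { return { tag: "var", name: name }; };
var add = function (l, r) { return { tag: "add", l: l, r: r }; };

// Meta-level substitution [v/x]e: rebuild e with x replaced by v.
function subst(e, x, v) {
  switch (e.tag) {
    case "num": return e;
    case "var": return e.name === x ? v : e;
    case "add": return add(subst(e.l, x, v), subst(e.r, x, v));
  }
}

// Evaluate a closed term (all variables already substituted away).
function evalTerm(e) {
  switch (e.tag) {
    case "num": return e.n;
    case "add": return evalTerm(e.l) + evalTerm(e.r);
    case "var": throw new Error("free variable: " + e.name);
  }
}

// Explicit substitution: pair the term with an environment instead.
function closure(e, env) { return { term: e, env: env }; }

function evalClosure(c) {
  var e = c.term;
  switch (e.tag) {
    case "num": return e.n;
    case "var":
      if (!c.env.has(e.name)) throw new Error("free variable: " + e.name);
      return c.env.get(e.name).n;  // environments map variables to values
    case "add":
      // Propagate the environment from the closure to inner expressions.
      return evalClosure(closure(e.l, c.env)) + evalClosure(closure(e.r, c.env));
  }
}

var body = add(vr("x"), num(1));
var viaSubst = evalTerm(subst(body, "x", num(41)));
var viaClosure = evalClosure(closure(body, new Map([["x", num(41)]])));
console.log(viaSubst, viaClosure);
```

The closure form never rebuilds the term, which is why environments are the traditional choice for implementations and for the abstract machine derived later.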

The complete syntax of values and closures in $\lambda\rho_{JS}$ is given in
figure 1. The semantics of $\lambda\rho_{JS}$ is given in terms of a small-step
reduction relation defined in figure 2. There are four classes of reductions:

1. context-insensitive, store-insensitive reductions operating over closures to
implement computations that have no effect on the context or store,

2. context-sensitive or store-sensitive reductions operating over pairs of stores
and programs to implement memory- and control-effects,

3. (omitted) reductions propagating environments from closures to inner
expressions, and

4. (omitted) reductions raising exceptions that represent run-time errors such
as applying a non-function, branching on a non-boolean,² indexing into a
non-record or with a non-string key, etc. As a result, $\lambda\rho_{JS}$
programs do not get stuck: either they diverge or result in a value or an
uncaught exception.

Reduction proceeds by a program being decomposed into a redex and evaluation
context, which represents a portion of program text with a single hole,
written $[\,]$. The grammar of evaluation contexts, defined in figure 3,
specifies where in a program reduction may occur. The notation $\mathcal{E}[c]$
denotes both the decomposition of a program into the evaluation context
$\mathcal{E}$ with $c$ in the hole and the plugging of $c$ into $\mathcal{E}$,
which replaces the single hole in $\mathcal{E}$ by $c$. In addition to
closures, holes may also be replaced by contexts, which yields another context.
This is indicated with the notation "$\mathcal{E}[\mathcal{E}']$".

There are three classes of evaluation contexts in figure 3: local contexts
$\mathcal{C}$ range over all contexts that do not include exception handlers,
finalizers, or labels; control contexts $\mathcal{D}$ range over contexts that
are either empty or have an outermost exception handler, finalizer, or label;
and general contexts $\mathcal{E}$ range over all evaluation contexts.

² One of JavaScript's quirks is its broad definition of which values act as true
and false, a quirk which doesn't appear to be modeled here at first glance. The
desugaring transformation eliminates this quirk by coercing the condition in an
if expression.

Fig. 2: Reduction semantics for $\lambda\rho_{JS}$ (context-insensitive,
store-insensitive rules; context-sensitive, store-sensitive rules)

\[
\mathcal{C} ::= [\,] \mid \mathsf{let}\ (x = \mathcal{C})\ c
  \mid \mathcal{C}(\vec{c})
  \mid v(\vec{v}, \mathcal{C}, \vec{c})
  \mid \{\vec{s}{:}\vec{v},\ s{:}\mathcal{C},\ \vec{s}{:}\vec{c}\}
  \mid op(\vec{v}, \mathcal{C}, \vec{c})
\]

Fig. 3: Evaluation contexts for $\lambda\rho_{JS}$

The distinction is made to describe the behavior of $\lambda\rho_{JS}$'s control
constructs: breaks, finalizers, and exceptions. When an exception is thrown, the
enclosing local context is discarded and if the nearest enclosing control
context is an exception handler, the thrown value is given to the handler:³

\[
\mathcal{E}[\mathsf{try}\ \{\mathcal{C}[\mathsf{throw}\ v]\}\ \mathsf{catch}\ (x)\{\langle e, \rho\rangle\}]
  \longmapsto \mathcal{E}[\langle e, \rho[x \mapsto v]\rangle].
\]

If the nearest enclosing control context is a finalizer, the finalization clause
is evaluated and the exception is rethrown:

\[
\mathcal{E}[\mathsf{try}\ \{\mathcal{C}[\mathsf{throw}\ v]\}\ \mathsf{finally}\ \{c\}]
  \longmapsto \mathcal{E}[c;\ \mathsf{throw}\ v].
\]

If the nearest enclosing control context is a label, the context up to and
including the label is discarded and the thrown value continues outward toward
the next control context:

\[
\mathcal{E}[\ell : \{\mathcal{C}[\mathsf{throw}\ v]\}] \longmapsto \mathcal{E}[\mathsf{throw}\ v].
\]

Finally, if there is no enclosing control context, the exception was not handled
and the program has resulted in an error:

\[
\mathcal{C}[\mathsf{throw}\ v] \longmapsto \mathsf{err}\ v.
\]

Breaks are handled in a similar way, except local contexts are discarded until
the matching label is found. In the case of finalizers, the finalization clause
is run, followed by reinvoking the break.
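The four cases for a thrown value can be sketched as a search outward through the enclosing contexts. This is a minimal illustration under stated assumptions, not the paper's semantics: the frame representation (an array with the innermost frame first, tagged "local", "catch", "finally", or "label") and the function name unwindThrow are hypothetical, and the function applies the rules repeatedly rather than one reduction step at a time.

```javascript
// Hypothetical frame encoding, innermost first:
//   { kind: "local" }                    part of a local context
//   { kind: "catch", param, body }       try {...} catch (x) {...}
//   { kind: "finally", clause }          try {...} finally {c}
//   { kind: "label", name }              l: {...}
function unwindThrow(v, frames) {
  for (var i = 0; i < frames.length; i++) {
    var f = frames[i];
    var rest = frames.slice(i + 1);
    if (f.kind === "catch") {
      // The handler receives the thrown value.
      return { kind: "handle", param: f.param, body: f.body, value: v, rest: rest };
    }
    if (f.kind === "finally") {
      // Run the finalization clause, then rethrow.
      return { kind: "finalize", clause: f.clause, rethrow: v, rest: rest };
    }
    // Local frames and labels are discarded; the throw continues outward.
  }
  // No enclosing control context handled the exception.
  return { kind: "err", value: v };
}

var caught = unwindThrow("boom", [
  { kind: "local" },
  { kind: "label", name: "out" },
  { kind: "catch", param: "x", body: "handler body" }
]);
var finalized = unwindThrow("boom", [
  { kind: "local" },
  { kind: "finally", clause: "cleanup" },
  { kind: "catch", param: "x", body: "handler body" }
]);
var uncaught = unwindThrow("boom", [{ kind: "local" }]);
console.log(caught.kind, finalized.kind, uncaught.kind);
```

In the finalizer case the remaining frames are kept in rest, because the rethrown exception must continue its search from that point once the clause has run.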

The result of a computation is an answer, which consists of a store and either
a value or error, indicating an uncaught exception:

\[
A ::= \langle\sigma, v\rangle \mid \langle\sigma, \mathsf{err}\ v\rangle
\]

The evaluation of a program is defined by a partial function relating programs
to answers:

\[
eval(e) = A \quad \text{if } inj_{JS}(e) \longmapsto^{*} A, \text{ for some } A,
\]

where $\longmapsto^{*}$ denotes the reflexive, transitive closure of the
reduction relation defined in figure 2 and

\[
inj_{JS}(e) = \langle\emptyset, \langle e, \emptyset\rangle\rangle.
\]

Having established the syntax and semantics of the $\lambda\rho_{JS}$-calculus,
we now relate it to Guha et al.'s $\lambda_{JS}$-calculus.

³ We omit the store from these examples since they have no effect upon it.



2.3 Correspondence with $\lambda_{JS}$

We have developed an explicit substitution variant of $\lambda_{JS}$ in order to
derive an environment-based abstract machine, which, as we will see, is
important for the subsequent abstract interpretation. However, let us briefly
discuss this new calculus's relation to $\lambda_{JS}$ and establish their
correspondence so that we can rest assured that our analytic framework is really
reasoning about $\lambda_{JS}$ programs.

Our presentation of evaluation contexts for $\lambda\rho_{JS}$ closely follows
Guha, et al. There are two important differences.

1. The grammar of evaluation contexts for $\lambda_{JS}$ makes a distinction
between local contexts including labels, and local contexts including exception
handlers. Let $\mathcal{F}$ and $\mathcal{G}$ denote such contexts,
respectively:

\[
\begin{array}{lcl}
\mathcal{F} & ::= & \mathcal{C} \mid \mathcal{C}[\ell : \{\mathcal{F}\}] \\
\mathcal{G} & ::= & \mathcal{C} \mid \mathcal{C}[\mathsf{try}\ \{\mathcal{G}\}\ \mathsf{catch}\ (x)\{c\}].
\end{array}
\]

The distinction allows for exceptions to effectively jump over enclosing labels
and for breaks to jump over handlers in one step of reduction:

\[
\mathcal{E}[\mathsf{try}\ \{\mathcal{F}[\mathsf{throw}\ v]\}\ \mathsf{catch}\ (x)\{\langle e, \rho\rangle\}]
  \longmapsto \mathcal{E}[\langle e, \rho[x \mapsto v]\rangle],
\]

and

\[
\begin{array}{lcll}
\mathcal{E}[\ell' : \{\mathcal{G}[\mathsf{break}\ \ell\ v]\}]
  & \longmapsto & \mathcal{E}[v], & \text{if } \ell' = \ell \\
  & \longmapsto & \mathcal{E}[\mathsf{break}\ \ell\ v], & \text{otherwise.}
\end{array}
\]

It should be clear that our notion of reduction can simulate the above one-step
reductions in one or more steps corresponding to the number of labels
(exception handlers) in $\mathcal{F}$ (in $\mathcal{G}$). We adopt our single
notion of label- and handler-free local contexts in order to simplify the
abstract machine in the subsequent section.

2. The grammar of evaluation contexts for $\lambda_{JS}$ mistakenly does not
include break contexts in the set of local contexts, causing break expressions
within break expressions to get stuck, falsifying the soundness theorem. The
mistake is minor and easily fixed. When relating $\lambda\rho_{JS}$ to
$\lambda_{JS}$ we assume this correction has been made.

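We write $\lambda_{JS}$ and $\lambda\rho_{JS}$ over a reduction relation to
denote the (omitted) one-step reduction relation as given by Guha, et al.,
corrected as described above, and the one-step reduction as defined in figure 2,
respectively.

The results of $\lambda_{JS}$ and $\lambda\rho_{JS}$ evaluation are related by a
function $\mathcal{U}$ that recursively forces all of the delayed substitutions
represented by an environment ([4, §2.5]), thus mapping a value to a syntactic
value. It is the identity function on syntactic values; for answers, functions,
and records it is defined as:

\[
\begin{array}{lcl}
\mathcal{U}(\langle\sigma, \mathsf{err}\ v\rangle)
  & = & \langle\sigma, \mathsf{err}\ \mathcal{U}(v)\rangle \\
\mathcal{U}(\langle\mathsf{fun}(\vec{x})\ \{\, e \,\}, \{(x_0, v_0), \ldots, (x_n, v_n)\}\rangle)
  & = & \mathsf{fun}(\vec{x})\ \{\, [\mathcal{U}(v_0)/x_0, \ldots, \mathcal{U}(v_n)/x_n]e \,\} \\
\mathcal{U}(\langle\{\vec{s}{:}\vec{v}\}, \{(x_0, v_0), \ldots, (x_n, v_n)\}\rangle)
  & = & \{\vec{s} : [\mathcal{U}(v_0)/x_0, \ldots, \mathcal{U}(v_n)/x_n]\vec{v}\}.
\end{array}
\]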



We can now formally state the calculi's correspondence: 
Lemma 1 (Correspondence). For all programs e, 

where $A = \mathcal{U}(A')$.

Proof. (Sketch.) The proof follows the structure of Biernacka and Danvy's [4]
proof of correspondence for the $\lambda$-calculus and Curien's
$\lambda\rho$-calculus of explicit substitutions [11], straightforwardly
extended to $\lambda_{JS}$ and $\lambda\rho_{JS}$.

We have now established our semantic basis: a calculus of explicit
substitutions corresponding to $\lambda_{JS}$, which is a model adequate for all
of JavaScript minus eval. In the following section, we apply the syntactic
correspondence to derive a correct-by-construction environment-based abstract
machine.

3 The JavaScript Abstract Machine (JAM) 

In this section, we present the JAM: the JavaScript Abstract Machine. The
abstract machine takes the form of a first-order state transition system that
operates over triples consisting of a store, a closure, and a control stack, represented
by a list of evaluation contexts. There are three classes of transitions for the 
machine: those that evaluate, those that continue, and those that apply. 

evaluate: Evaluation transitions operate over triples dispatching on the closure 
component. The eval transitions implement a search for a redex or a value. 
If the closure is a value, then the search for a value is complete; the machine 
transitions to a continue state to plug the value into its context. Alterna- 
tively, if the closure is a redex, then the search is also complete and the 
machine transitions to an apply state to contract the redex. Finally, if the 
closure is neither a redex nor a value, the search continues; the machine
selects the next closure to search and pushes a single evaluation context on 
to the control stack. 

continue: Continuation transitions operate over triples where the closure com- 
ponent is always a value, dispatching on the top evaluation context. The 
value is being plugged into the context represented by the stack. If plugging 
the value into the context results in a redex, the machine transitions to an 
apply state. If plugging the value reveals the next closure that needs to be 
evaluated, the machine transitions to an evaluate state. If plugging the value 
in turn results in another value, the machine transitions to a continue state 
to plug that value. Finally, if the control stack is empty, the result of
the program has been reached and the machine halts with the answer. 

apply: Application transitions operate over triples where the closure component
is always a redex. These transitions dispatch (mostly) on the redex and serve
to contract it, thus implementing the reduction relation of figure 2. Since
reductions are potentially store- and context-sensitive, the transitions may
also dispatch on the control continuation in order to implement the control
operators.

Fig. 4: Evaluation transitions
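The interplay of the three classes of transitions can be sketched on a toy language of numbers and addition. This is an illustration under heavy simplifying assumptions, not the JAM itself: there is no store or environment, the frame tags addL and addR, the state field names, and the functions stepMachine and runMachine are all hypothetical.

```javascript
// Hypothetical toy terms.
var mkNum = function (n) { return { tag: "num", n: n }; };
var mkAdd = function (l, r) { return { tag: "add", l: l, r: r }; };

// One transition of a toy evaluate/continue/apply machine.
// The stack is an array of frames with the top frame first.
function stepMachine(s) {
  if (s.mode === "ev") {
    // evaluate: search for a value or a redex
    if (s.c.tag === "num") return { mode: "co", v: s.c.n, stack: s.stack };
    // Neither yet: push one frame and search the left operand.
    return { mode: "ev", c: s.c.l, stack: [{ tag: "addL", r: s.c.r }].concat(s.stack) };
  }
  if (s.mode === "co") {
    // continue: plug the value into the top frame
    if (s.stack.length === 0) return { mode: "halt", v: s.v };
    var top = s.stack[0];
    var rest = s.stack.slice(1);
    if (top.tag === "addL") {
      // Plugging reveals the next term to evaluate.
      return { mode: "ev", c: top.r, stack: [{ tag: "addR", l: s.v }].concat(rest) };
    }
    // Plugging into an addR frame yields a redex.
    return { mode: "ap", redex: { l: top.l, r: s.v }, stack: rest };
  }
  // apply: contract the redex and continue with its value
  return { mode: "co", v: s.redex.l + s.redex.r, stack: s.stack };
}

function runMachine(e) {
  var s = { mode: "ev", c: e, stack: [] };
  while (s.mode !== "halt") s = stepMachine(s);
  return s.v;
}

var machineAnswer = runMachine(mkAdd(mkNum(1), mkAdd(mkNum(2), mkNum(3))));
console.log(machineAnswer);
```

Each evaluate step pushes at most one frame and each continue step pops at most one, which is the stack discipline that a pushdown abstraction later relies on.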

The machine relies on three functions for interacting with the store:

\[
\begin{array}{lcl}
alloc & : & State \to Address^{n} \\
put   & : & Store \times Address \times Value \to Store \\
get   & : & Store \times Address \to \mathcal{P}(Value)
\end{array}
\]

The alloc function makes explicit the non-deterministic choice of addresses when
allocating space in the store. It returns a vector of addresses, often a
singleton, based on the current state of the machine. For the moment, all that
we require of alloc is that it return a suitable number of addresses and that
none are in use in the store. The put function updates a store location and is
defined as:

\[
put(\sigma, a, v) = \sigma[a \mapsto v],
\]

and the get function retrieves a value from a store location as a singleton set:

\[
get(\sigma, a) = \{\sigma(a)\}.
\]
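A minimal sketch of the three concrete store functions follows. The representation is an assumption: stores are modeled as JavaScript Map objects keyed by integer addresses, and alloc takes the store and a count rather than a whole machine state, which simplifies the signature given above.

```javascript
// Hypothetical simplification: alloc inspects only the store, not the full state.
// It returns n addresses, none of which are in use in the store.
function alloc(store, n) {
  var next = 0;
  store.forEach(function (_value, addr) {
    if (addr >= next) next = addr + 1;
  });
  var fresh = [];
  for (var i = 0; i < n; i++) fresh.push(next + i);
  return fresh;
}

// put(σ, a, v) = σ[a ↦ v]: functional update, the old store is left unchanged.
function put(store, a, v) {
  var updated = new Map(store);
  updated.set(a, v);
  return updated;
}

// get(σ, a) = {σ(a)}: the value at a location, as a singleton set.
function get(store, a) {
  return new Set([store.get(a)]);
}

var s0 = new Map();
var a0 = alloc(s0, 1)[0];
var s1 = put(s0, a0, "first");
var a1 = alloc(s1, 1)[0];
var s2 = put(s1, a1, "second");
var s3 = put(s2, a0, "overwritten");
console.log(a0, a1, Array.from(get(s3, a0)));
```

Returning a set from get costs nothing in the concrete machine, and it means the transition rules need no change when the store is later made to hold several values per address.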



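We make explicit the use of these three functions because they will form the
essential mechanism of abstracting the machine in the subsequent section; the
slightly strange definition for get is to facilitate approximation where there
may be multiple values residing at a store location.

\[
\begin{array}{lcl}
\langle\sigma, v, \mathsf{nil}\rangle_{co} & \longmapsto & \langle\sigma, v\rangle \\
\langle\sigma, v, \mathsf{let}\ (x = \bullet)\ c :: E\rangle_{co} & \longmapsto & \langle\sigma, \mathsf{let}\ (x = v)\ c, E\rangle_{ap} \\
\langle\sigma, v, \bullet() :: E\rangle_{co} & \longmapsto & \langle\sigma, v(), E\rangle_{ap} \\
\langle\sigma, v, \bullet(c, \ldots) :: E\rangle_{co} & \longmapsto & \langle\sigma, c, v(\bullet, \ldots) :: E\rangle_{ev} \\
\langle\sigma, v, u(u, \ldots, \bullet) :: E\rangle_{co} & \longmapsto & \langle\sigma, u(u, \ldots, v), E\rangle_{ap} \\
\langle\sigma, v, u(u, \ldots, \bullet, c, \ldots) :: E\rangle_{co} & \longmapsto & \langle\sigma, c, u(u, \ldots, v, \bullet, \ldots) :: E\rangle_{ev} \\
\langle\sigma, v, \{s_1{:}u, \ldots, s_n{:}\bullet\} :: E\rangle_{co} & \longmapsto & \langle\sigma, \{s_1{:}u, \ldots, s_n{:}v\}, E\rangle_{co} \\
\langle\sigma, v, \{s_1{:}u, \ldots, s_i{:}\bullet, s_{i+1}{:}c, \ldots\} :: E\rangle_{co} & \longmapsto & \langle\sigma, c, \{s_1{:}u, \ldots, s_i{:}v, s_{i+1}{:}\bullet, \ldots\} :: E\rangle_{ev} \\
\langle\sigma, v, \bullet[c] :: E\rangle_{co} & \longmapsto & \langle\sigma, c, v[\bullet] :: E\rangle_{ev} \\
\langle\sigma, v, u[\bullet] :: E\rangle_{co} & \longmapsto & \langle\sigma, u[v], E\rangle_{ap} \\
\langle\sigma, v, \bullet[c] = d :: E\rangle_{co} & \longmapsto & \langle\sigma, c, v[\bullet] = d :: E\rangle_{ev} \\
\langle\sigma, v, u[\bullet] = c :: E\rangle_{co} & \longmapsto & \langle\sigma, c, u[v] = \bullet :: E\rangle_{ev} \\
\langle\sigma, v, u[u'] = \bullet :: E\rangle_{co} & \longmapsto & \langle\sigma, u[u'] = v, E\rangle_{ap} \\
\langle\sigma, v, \mathsf{del}\ \bullet[c] :: E\rangle_{co} & \longmapsto & \langle\sigma, c, \mathsf{del}\ v[\bullet] :: E\rangle_{ev} \\
\langle\sigma, v, \mathsf{del}\ u[\bullet] :: E\rangle_{co} & \longmapsto & \langle\sigma, \mathsf{del}\ u[v], E\rangle_{ap} \\
\langle\sigma, v, \mathsf{ref}\ \bullet :: E\rangle_{co} & \longmapsto & \langle\sigma, \mathsf{ref}\ v, E\rangle_{ap} \\
\langle\sigma, v, \mathsf{deref}\ \bullet :: E\rangle_{co} & \longmapsto & \langle\sigma, \mathsf{deref}\ v, E\rangle_{ap} \\
\langle\sigma, v, \bullet = c :: E\rangle_{co} & \longmapsto & \langle\sigma, c, v = \bullet :: E\rangle_{ev} \\
\langle\sigma, v, u = \bullet :: E\rangle_{co} & \longmapsto & \langle\sigma, u = v, E\rangle_{ap} \\
\langle\sigma, v, \mathsf{if}\ (\bullet)\{c\}\{d\} :: E\rangle_{co} & \longmapsto & \langle\sigma, \mathsf{if}\ (v)\{c\}\{d\}, E\rangle_{ap} \\
\langle\sigma, v, \bullet; c :: E\rangle_{co} & \longmapsto & \langle\sigma, c, E\rangle_{ev} \\
\langle\sigma, v, \mathsf{throw}\ \bullet :: E\rangle_{co} & \longmapsto & \langle\sigma, \mathsf{throw}\ v, E\rangle_{ap} \\
\langle\sigma, v, \mathsf{break}\ \ell\ \bullet :: E\rangle_{co} & \longmapsto & \langle\sigma, \mathsf{break}\ \ell\ v, E\rangle_{ap} \\
\langle\sigma, v, op(u, \ldots, \bullet) :: E\rangle_{co} & \longmapsto & \langle\sigma, op(u, \ldots, v), E\rangle_{ap} \\
\langle\sigma, v, op(u, \ldots, \bullet, c, \ldots) :: E\rangle_{co} & \longmapsto & \langle\sigma, c, op(u, \ldots, v, \bullet, \ldots) :: E\rangle_{ev}
\end{array}
\]

Fig. 5: Continuation transitions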

The initial machine configuration is an evaluation state consisting of the
empty store, the program, the empty environment, and the empty control and
local continuation:

\[
inj_{JAM}(e) = \langle\emptyset, \langle e, \emptyset\rangle, \mathsf{nil}\rangle_{ev}.
\]

Final configurations are answers, just as in the reduction semantics.

3.1 Reformulation of reduction semantics 

Unfortunately, there is an immediate problem with the described approach when 
applied to the JavaScript abstract machine. The problem stems from the JAM 
having two control stacks. Consequently, when abstracting we arrive at a two- 
stack pushdown machine, which in general has the power to simulate a Turing
machine. However, this problem can be overcome: the JAM can be reformulated
into a single stack machine in such a way that preserves correctness and enables 
a pushdown abstraction that is decidable. 



{a, {x,p),E) 



let (x = v) c, E)ap 

(fun(^) { e },p)(.v),E)ap 

is : V, Si-.v, s : v'} lSil,E)ap 

{.'sTvy[sxl,E)ap 

{JTv, Si:Vi,'sTv'y[sil =v,E)ap 

del {JTv, Si -.VijJTv'yisi^ , E) ap 

del isTvyLs^l,E) 

ap 

\f (true) icy idy,E)ap 
if(false){c}{d},£;)ap 

OPn (Ul, . . . , Vn),E)^o 

ref v,E)ap 

deref a, E)ap 
a = v,E)ap 
throw V, nil) 

throw try ■[•}■ catch (a;){(e,p)} 



E)a 



throw ?;,try ■[•}■ finally icy :: E)ap 
throw ?;,£:{•}:: E)ap 
throw w, C ;: E)ap 

break f Ujtry {a;} catch (•){c} :: i5)ap 
break I v,try {•} finally {c> :: £)ap 
break ^ v,l:i • } :: _B)ap 
break ^ • } :: E)ap 

break ^ u,C :: £;>„p 



o", V, E) CO if f G gst{a, a) 
put{a, a, v), (c, p[x n> a]),E)ev 
where a — alloc{<i) 

tt{a, a, v), (e, p[x a]), _E)™ 
if = |?;|, where a = alloc{';) 
a,v, E)co 

a, undef , E) co ii s^ ^s 
a, is:v, Si'.v, s:v'y, E)co 
a, is:v, s:v'y, E) co 

Cr, {TTu}, E)co if Sa; ^ s 
Cr, C, _E)ci; 
cr, d, £')et, 

(7,u,£;>co if (5(op„,-i;i, . . . = 
put{a, a, v),a, E)co 
where a — alloc(<;) 
a, V, E) CO if u G get{a, a) 
put{a,a,v),v,E) CO 
err v, a) 

put{a, a, v), (e, p[x i->- a]),E)ev 
where a — alloc{<;) 
a, c; throw v, E)cv 
cr, throw v,E)ap 
cr, throw v,E)ap 
cr, break £ v, E)ev 
cr, c;break £ v, E)ev 
a,v,E)co 
a,v,E)co if ^ 
cr, breaks 7;, S) 

ap 



Fig. 6: Application transitions 



One of the lessons of our abstract machine-based approach to analysis is 
that many problems in program analysis can be solved at the semantic level 
and then imported systematically to the analytic side. So for example, abstract
garbage collection [27] can be expressed as concrete garbage collection with the
pointer refinement and store abstraction applied [39]. Similarly, the exponential
complexity of k-CFA can be avoided by concretely changing the representation
of closures and then abstracting in an unremarkable way [29].

We likewise solve our two-stack problem by a reformulation at the level of
the reduction semantics for $\lambda\rho_{JS}$ and then repeat the refocusing construction to
derive a one-stack variant of the JAM. 

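The basic reason for maintaining the control and local stack is to allow jumps
over the local context whenever a control operator is invoked. This is seen in
the reduction semantics with reductions such as this:

\[
\begin{array}{lcll}
\mathcal{E}[\ell' : \{\mathcal{C}[\mathsf{break}\ \ell\ v]\}]
  & \longmapsto & \mathcal{E}[v] & \text{if } \ell' = \ell \\
  & \longmapsto & \mathcal{E}[\mathsf{break}\ \ell\ v] & \text{if } \ell' \neq \ell
\end{array}
\]

\[
\begin{array}{lcl}
\langle\sigma, \mathsf{throw}\ v\rangle & \longmapsto & \langle\sigma, \mathsf{err}\ v\rangle \\
\langle\sigma, \mathcal{E}[S[\mathsf{throw}\ v]]\rangle & \longmapsto & \langle\sigma, \mathcal{E}[\mathsf{throw}\ v]\rangle \\
\langle\sigma, \mathcal{E}[\mathsf{try}\ \{\mathsf{throw}\ v\}\ \mathsf{catch}\ (x)\{\langle e, \rho\rangle\}]\rangle & \longmapsto & \langle\sigma, \mathcal{E}[\langle e, \rho[x \mapsto v]\rangle]\rangle \\
\langle\sigma, \mathcal{E}[\mathsf{try}\ \{\mathsf{throw}\ v\}\ \mathsf{finally}\ \{c\}]\rangle & \longmapsto & \langle\sigma, \mathcal{E}[c;\ \mathsf{throw}\ v]\rangle \\
\langle\sigma, \mathcal{E}[\ell : \{\mathsf{throw}\ v\}]\rangle & \longmapsto & \langle\sigma, \mathcal{E}[\mathsf{throw}\ v]\rangle \\
\langle\sigma, \mathcal{E}[S[\mathsf{break}\ \ell\ v]]\rangle & \longmapsto & \langle\sigma, \mathcal{E}[\mathsf{break}\ \ell\ v]\rangle \\
\langle\sigma, \mathcal{E}[\mathsf{try}\ \{\mathsf{break}\ \ell\ v\}\ \mathsf{catch}\ (x)\{c\}]\rangle & \longmapsto & \langle\sigma, \mathcal{E}[\mathsf{break}\ \ell\ v]\rangle \\
\langle\sigma, \mathcal{E}[\mathsf{try}\ \{\mathsf{break}\ \ell\ v\}\ \mathsf{finally}\ \{c\}]\rangle & \longmapsto & \langle\sigma, \mathcal{E}[c;\ \mathsf{break}\ \ell\ v]\rangle \\
\langle\sigma, \mathcal{E}[\ell : \{\mathsf{break}\ \ell\ v\}]\rangle & \longmapsto & \langle\sigma, \mathcal{E}[v]\rangle \\
\langle\sigma, \mathcal{E}[\ell' : \{\mathsf{break}\ \ell\ v\}]\rangle & \longmapsto & \langle\sigma, \mathcal{E}[\mathsf{break}\ \ell\ v]\rangle, \quad \text{if } \ell' \neq \ell
\end{array}
\]

Fig. 7: Reformulated reduction semantics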

To enable a single stack, we simulate this jump over the local context by
"bubbling" over it in a piecemeal fashion. This is accomplished by defining a
notion of single, non-empty local context frames $S$, i.e.,

\[
S ::= \mathsf{let}\ (x = \bullet)\ c \mid \bullet(\vec{c}) \mid v(\vec{v}, \bullet, \vec{c})
  \mid \cdots \mid \mathsf{break}\ \ell\ \bullet \mid op(\vec{v}, \bullet, \vec{c}).
\]

The reduction relation for control operators then remains context-sensitive,
but operates only over the enclosing frame rather than over whole contexts, and
so can be implemented with a stack. The rules for simulating the above reduction
are then:

\[
\begin{array}{lcll}
\mathcal{E}[S[\mathsf{break}\ \ell\ v]] & \longmapsto & \mathcal{E}[\mathsf{break}\ \ell\ v] & \\
\mathcal{E}[\ell' : \{\mathsf{break}\ \ell\ v\}] & \longmapsto & \mathcal{E}[v] & \text{if } \ell' = \ell \\
  & \longmapsto & \mathcal{E}[\mathsf{break}\ \ell\ v] & \text{if } \ell' \neq \ell
\end{array}
\]

Clearly, these reductions simulate the original single reduction by a number of
steps that corresponds to the number of local frames between the label and the
break.
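The frame-by-frame bubbling of a break can be sketched as a function that inspects only the top frame of a single stack. The frame encoding, the result tags, and the names stepBreak and bubble are assumptions made for illustration; the cases for handlers and finalizers follow the treatment of break described for the reformulated semantics.

```javascript
// Hypothetical frame encoding, top (innermost) frame first:
//   { kind: "local" }             a single local frame S
//   { kind: "catch" }             try {•} catch (x){c}
//   { kind: "finally", clause }   try {•} finally {c}
//   { kind: "label", name }       l': {•}
function stepBreak(label, v, stack) {
  if (stack.length === 0) return { kind: "stuck" };  // no enclosing label
  var top = stack[0];
  var rest = stack.slice(1);
  switch (top.kind) {
    case "local":
    case "catch":
      // Pop one frame; the break keeps bubbling.
      return { kind: "break", stack: rest };
    case "finally":
      // Run the clause next, then reinvoke the break.
      return { kind: "finalize", clause: top.clause, stack: rest };
    case "label":
      return top.name === label
        ? { kind: "value", value: v, stack: rest }   // matching label: deliver v
        : { kind: "break", stack: rest };            // other label: keep going
  }
}

// Repeat single steps until the break stops bubbling, counting the steps.
function bubble(label, v, stack) {
  var steps = 0;
  var r = { kind: "break", stack: stack };
  while (r.kind === "break") {
    r = stepBreak(label, v, r.stack);
    steps++;
  }
  return { result: r, steps: steps };
}

var outcome = bubble("out", 10, [
  { kind: "local" },
  { kind: "local" },
  { kind: "label", name: "inner" },
  { kind: "label", name: "out" },
  { kind: "local" }
]);
console.log(outcome.steps, outcome.result.kind, outcome.result.value);
```

Because stepBreak never looks below the top frame, each step is an ordinary pop of a pushdown stack, which is what makes the single-stack machine suitable for a pushdown abstraction.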

The complete replacement of the context-sensitive reductions is given in
figure 7. We refer to this alternative reduction semantics as
$\lambda\rho'_{JS}$.

Lemma 2. For all programs e,

\[
inj_{JS}(e) \longmapsto^{*}_{\lambda\rho'_{JS}} A \iff inj_{JS}(e) \longmapsto^{*}_{\lambda\rho_{JS}} A.
\]

Guha et al. handle primitive operations in $\lambda_{JS}$ in standard fashion by
delegating to a $\delta$-function. For the sake of analysis, we can delegate to
any sound, finite abstraction of the $\delta$-function. The simplest such
abstraction maps values to their types, which makes the abstract $\delta$
function isomorphic to its intensional signature. For in-depth discussion of
richer abstract domains over basic values for use in JavaScript, we refer the
reader to Jensen et al. [21]; they provide abstract domains for JavaScript which
could be plugged directly into the JAM.

3.2 Correctness of the JAM 

The JAM is a correct evaluator for $\lambda\rho_{JS}$, and hence for $\lambda_{JS}$ as well.
Lemma 3 (Correctness). For all programs e, 



Proof. (Sketch.) The correctness of the machine follows from the correctness of 
refocusing [13] and the (trivial) meaning preservation of subsequent transforma- 
tions. 

The detailed step-by-step transformation from the reduction semantics to the 
abstract machine has been carried out in the meta-language of SML. 

For the purposes of program analysis, we rely on the following definition of
a program's reachable machine states, where $\varsigma$ ranges over states:

\[
JAM(e) = \{\varsigma \mid inj_{JAM}(e) \longmapsto^{*}_{JAM} \varsigma\}.
\]

The set JAM(e) is potentially infinite and membership is clearly undecidable.
In the next section, we devise a sound and computable approximation to the set
JAM(e) by a family of pushdown automata.

4 Pushdown abstractions of JavaScript

To model non-local control precisely, the analysis must model the program stack
precisely. Yet, the program stack can grow without bound, a substantial
obstacle for the finite-state framework. Pushdown abstraction maps that program
stack onto the unbounded stack of a pushdown system. Because the analysis
inflicts a finite-state abstraction on the control states of the pushdown system,
the analysis remains decidable.

The idea is that by bounding the store, the control stack, while unbounded,
will consist of a finite stack alphabet. Since the remaining components of the
machine are finite, the abstract machine is equivalent in computational power
to a pushdown automaton, and thus reachability questions are cast naturally
in terms of decidable PDA reachability properties.

4.1 Bounding the store, not the stack 



The abstracted JAM provides a sound simulation of the JAM and, by the preceding
correspondence and correctness lemmas, a sound simulation of λρJS and λJS as well.

The machine's state-space is bounded simply by restricting the set of allocatable
addresses to a fixed set of finite size, $\widehat{Address}$. This necessitates a change in
the machine transition system and the representation of states. The machine can
no longer restrict allocated addresses to be fresh with respect to the domain of
the store, as is the case when bindings are allocated, ref-expressions are evaluated,
and continuations are pushed. Instead, the machine calls an allocation function
that returns a member of the finite set of addresses. Since the allocation function
may return an address already in use, the behavior of the store must change to
accommodate multiple values residing in a given location. We let $\hat{\sigma}$ range over
such stores:

\[ \hat{\sigma} \in \widehat{Store} = \widehat{Address} \to_{fin} \mathcal{P}(\widehat{Value}). \]

We let $\widehat{JAM}$ denote the abstract machine that results from replacing all
occurrences of the functions alloc, put, and get with the following counterparts:

\[ \widehat{alloc} : \widehat{State} \to \widehat{Address} \]
\[ \widehat{put} : \widehat{Store} \times \widehat{Address} \times \widehat{Value} \to \widehat{Store} \]
\[ \widehat{get} : \widehat{Store} \times \widehat{Address} \to \mathcal{P}(\widehat{Value}) \]

The $\widehat{alloc}$ function works like alloc, but produces addresses from the finite set
$\widehat{Address}$. The $\widehat{put}$ function updates a store location by joining the given value
to any existing values that reside at that address:

\[ \widehat{put}(\hat{\sigma}, \hat{a}, \hat{v}) = \hat{\sigma}[\hat{a} \mapsto \{\hat{v}\} \cup \hat{\sigma}(\hat{a})]. \]

Joining rather than updating is critical for maintaining soundness. 

In essence, the finiteness of the address space implies collisions may occur 
in the store. By joining, we ensure these collisions are modelled safely. The get 
function returns the set of values at a store location, allowing a non-deterministic 
choice of values at that location. 
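
The joining behavior of the abstract store can be sketched in a few lines of Python. This is a minimal model under stated assumptions: stores are represented as dictionaries from addresses to frozensets, and the function names `put` and `get` simply mirror the text rather than any existing implementation.

```python
def put(store, addr, value):
    """Return a new store in which `value` is joined into the set at `addr`.

    Existing values at `addr` are kept, never overwritten.
    """
    updated = dict(store)
    updated[addr] = store.get(addr, frozenset()) | {value}
    return updated


def get(store, addr):
    """Return the set of values that may reside at `addr`."""
    return store.get(addr, frozenset())
```

For instance, `get(put(put({}, "a", 1), "a", 2), "a")` yields a set containing both 1 and 2. A strong update that kept only 2 would forget a value that some concrete execution may still observe at an address colliding with "a", which is why joining is needed for soundness.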

We can formally relate the JAM to its abstracted counterpart through the
natural structural abstraction map α on their state-spaces. This map recurs over
the state-space of the JAM to inflict a finitizing abstraction at the leaves of its
state-space (addresses and primitive values) and at structures that cannot soundly
absorb that finitization, which in this case is only the store. The range of the store
expands into a power set, so that when an abstract address is re-allocated, it
can hold both the existing values and the newly added value; formally:

\[ \alpha(\sigma) = \lambda \hat{a}.\ \bigsqcup_{\alpha(a) = \hat{a}} \alpha(\sigma(a)). \]
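
The store case of the abstraction map can be illustrated with a small Python sketch. The representation is an assumption made for the example: a concrete store is a dictionary from addresses to values, and the address and value abstractions are passed in as ordinary functions (here called `alpha_addr` and `alpha_val`, both hypothetical names).

```python
def abstract_store(store, alpha_addr, alpha_val):
    """Abstract a concrete store.

    Every concrete address is mapped to its abstract address, and the
    abstractions of all values whose addresses collide are joined.
    """
    abstract = {}
    for addr, value in store.items():
        a_hat = alpha_addr(addr)
        abstract[a_hat] = abstract.get(a_hat, frozenset()) | {alpha_val(value)}
    return abstract
```

With concrete addresses of the form `(name, n)`, an address abstraction that keeps only `name`, and a value abstraction that maps a value to its type name, the two concrete bindings of `x` in `{("x", 0): 1, ("x", 1): "s"}` collapse into a single abstract location holding both type names.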



Theorem 1 (Soundness). If $\varsigma \longmapsto_{JAM} \varsigma'$ and $\alpha(\varsigma) \sqsubseteq \hat{\varsigma}$, then there exists an
abstract state $\hat{\varsigma}'$ such that $\hat{\varsigma} \longmapsto_{\widehat{JAM}} \hat{\varsigma}'$ and $\alpha(\varsigma') \sqsubseteq \hat{\varsigma}'$.

Proof. We reason by case analysis on the transition. In the cases where the
transition is deterministic, the result follows by calculation. For the remaining
non-deterministic cases, we must show an abstract state exists such that the
simulation is preserved. By examining the rules for these cases, we see that all
hinge on the abstract store in $\hat{\varsigma}$ soundly approximating the concrete store in $\varsigma$,
which follows from the assumption that $\alpha(\varsigma) \sqsubseteq \hat{\varsigma}$.

The more interesting aspect of the pushdown abstraction is decidability. No- 
tice that since the stack has a recursive, unbounded structure, the state-space of 
the machine is potentially infinite so deciding reachability by enumerating the 
reachable states will no longer suffice. 

Theorem 2 (Decidability). $\hat{\varsigma} \in \widehat{JAM}(e)$ is decidable.

Proof. States of the abstracted JAM consist of a store, a closure, and a list of
single evaluation contexts representing the control stack. Observe that with the
exception of the stack, each of these sets is finite: for a given program, there is
a fixed set of expressions; environments are finite since they map variables to
addresses and the address space is bounded; since expressions and environments
are finite, so too is the set of values; stores are finite since addresses and values
are finite.

For the machine transitions that dispatch on the control stack, only the topmost
element is used and stack operations always either push or pop a single
context frame at a time, i.e., the machine obeys a stack discipline. The
stack alphabet consists of single evaluation contexts, which include a number of
expressions, a value, or an environment, all of which are finite sets. Thus the
stack alphabet is finite. Consequently the machine is a pushdown automaton
and decidability follows from known results on pushdown automata.
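
To make the appeal to pushdown reachability concrete, the following Python sketch decides which control states of a small pushdown system are reachable from a start state with an empty stack. It is a naive saturation procedure written for illustration, not the algorithm used by the authors; the rule encoding (sets of tuples) and all state and frame names are assumptions of the example.

```python
def reachable_states(start, eps, pushes, pops):
    """Control states reachable from `start` with an initially empty stack.

    eps    : set of (q, q2)         stack-neutral steps
    pushes : set of (q, frame, q2)  push `frame`, move to q2
    pops   : set of (q, frame, q2)  pop `frame` (must be on top), move to q2

    `balanced` holds pairs (p, q) where p is reachable and q is reached from
    p by a path that leaves the stack as it found it.
    """
    reach = {start}
    balanced = {(start, start)}
    while True:
        new_pairs = {(q, q) for q in reach}
        new_states = set()
        for (p, q) in balanced:
            new_states.add(q)
            for (a, b) in eps:  # extend a balanced path by a neutral step
                if a == q:
                    new_pairs.add((p, b))
            for (q1, frame, q2) in pops:  # match a pop with the push that entered p
                if q1 == q:
                    for (r, f, entry) in pushes:
                        if entry == p and f == frame and r in reach:
                            new_pairs.add((r, q2))
            for (a, b) in balanced:  # compose two balanced paths
                if a == q:
                    new_pairs.add((p, b))
        for (r, _frame, entry) in pushes:  # an unmatched push still reaches its target
            if r in reach:
                new_states.add(entry)
        if new_pairs <= balanced and new_states <= reach:
            return reach
        balanced |= new_pairs
        reach |= new_states
```

The procedure terminates because both sets grow monotonically within finite bounds, which is exactly where the finiteness of the control states and of the stack alphabet is used. A pop is only ever matched against a push performed from a reachable state, so a return is never confused with the return of a call that cannot happen.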



4.2 Instantiations 

We have now described the design of a sound and decidable framework for the 
pushdown analysis of JavaScript. The framework has a single point of control for 
governing the precision of an analysis, namely the alloc function. The restrictions 
on acceptable alloc functions are fairly liberal: any allocation policy is sound, 
and so long as the policy draws from a finite set of addresses for a given program, 
the analysis will be decidable. 

At its simplest, the alloc function could produce a constant address: 

\[ \widehat{alloc}(\hat{\varsigma}) = a. \]

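A more refined analysis can be obtained by a more refined allocation policy.
0CFA, for example, distinguishes bindings of differently named variables, but
merges all bindings for a given variable name. The allocation function
corresponding to this strategy is:

\[ \widehat{alloc}(\langle \hat{\sigma}, \mathtt{let}\ (x = v)\ e, E \rangle_{ap}) = x \]
\[ \widehat{alloc}(\langle \hat{\sigma}, (\mathtt{fun}(x)\ \{\ e\ \}, \rho)(v), E \rangle_{ap}) = x \]
\[ \widehat{alloc}(\langle \hat{\sigma}, \mathtt{throw}\ v, \mathtt{try}\ \{\bullet\}\ \mathtt{catch}\ (x)\ \{e\} :: E \rangle_{ap}) = x \]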

This strategy uses variable names as addresses and always allocates the variable
name being bound. The strategy is finite for a given program and produces a
pushdown generalization of classical 0CFA. Moreover, the state-space simplifies
greatly since it can be observed that under this strategy, every environment is
the identity function on variable names. Thus environments could be eliminated
from the semantics.
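
The effect of the allocation policy on precision can be seen in a small Python sketch that contrasts the constant policy with the variable-name policy. As a simplifying assumption, an allocation function here receives only the variable being bound and the current stack, rather than a full machine state; the helper `bind_all` is hypothetical and exists only to replay a list of bindings.

```python
def alloc_constant(var, stack):
    """Coarsest policy: every binding shares one address."""
    return "a"


def alloc_0cfa(var, stack):
    """Monovariant policy: the variable name is the address."""
    return var


def bind_all(alloc, bindings):
    """Replay (var, value, stack) bindings into a joining store."""
    store = {}
    for var, value, stack in bindings:
        addr = alloc(var, stack)
        store[addr] = store.get(addr, frozenset()) | {value}
    return store
```

Replaying the bindings `x = 1`, `y = "s"`, `x = 2` under `alloc_constant` merges all three values at one address, while `alloc_0cfa` keeps `x` and `y` apart and merges only the two bindings of `x`.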

There is still the need to design a heap abstraction, i.e., what should the
allocation function produce for:

\[ \widehat{alloc}(\langle \hat{\sigma}, \mathtt{ref}\ v, E \rangle_{ap})\,? \]

Shivers' original formulation of 0CFA had a very simple heap abstraction
corresponding to the constant allocation function above [36]. More refined heap
abstractions are obtained by simply designing better strategies for this case of
alloc.

The k-CFA hierarchy, of which 0CFA is the base, refines the above allocation
policy by pairing variables together with a bounded history of the calling context
at the binding site of a variable. Such an abstraction is easily expressible in our
framework as follows:

\[ \widehat{alloc}(\langle \hat{\sigma}, \mathtt{let}\ (x = v)\ e, E \rangle_{ap}) = (x, \lfloor E \rfloor_k) \]
\[ \widehat{alloc}(\langle \hat{\sigma}, (\mathtt{fun}(x)\ \{\ e\ \}, \rho)(v), E \rangle_{ap}) = (x, \lfloor E \rfloor_k) \]
\[ \widehat{alloc}(\langle \hat{\sigma}, \mathtt{throw}\ v, \mathtt{try}\ \{\bullet\}\ \mathtt{catch}\ (x)\ \{e\} :: E \rangle_{ap}) = (x, \lfloor E \rfloor_k) \]

where

\[ \lfloor nil \rfloor_k = nil \]
\[ \lfloor E \rfloor_0 = nil \]
\[ \lfloor \varepsilon :: E \rfloor_{k+1} = \varepsilon :: \lfloor E \rfloor_k. \]

This strategy uses variable names paired together with a fixed-depth view of
the control stack to allocate bindings. It is easy to vary this strategy for various
k-CFA-like abstractions, e.g., taking the top k different stack frames, or taking
the top k application frames by filtering out non-application frames.

By giving alternative definitions of alloc it is straightforward to design
pushdown versions of other known analyses such as CPA [1], sub-0CFA [2], and
m-CFA.
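
The k-bounded view of the stack and the resulting allocation policy can be sketched in Python as follows. The sketch assumes that a stack is a tuple of frames with the top frame first, and that an allocation function receives the variable and the stack directly; the predicate `is_application` is a hypothetical parameter used to illustrate the "top k application frames" variant.

```python
def truncate(stack, k):
    """The top k frames of `stack` (a tuple, top frame first)."""
    if k == 0 or not stack:
        return ()
    return (stack[0],) + truncate(stack[1:], k - 1)


def alloc_kcfa(var, stack, k):
    """Pair the variable with a fixed-depth view of the control stack."""
    return (var, truncate(stack, k))


def alloc_top_k_applications(var, stack, k, is_application):
    """Variant: keep only application frames before truncating."""
    calls = tuple(frame for frame in stack if is_application(frame))
    return (var, truncate(calls, k))
```

With `k = 0` every address has the form `(x, ())`, so the policy degenerates to the variable-name policy; larger `k` separates bindings of the same variable made under different stack contexts, at the cost of a larger (but still finite) address space.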



4.3 Implementation 



To empirically substantiate our formal claims, we have developed executable
models and test beds. We have developed the reduction semantics of λρJS in
Standard ML (SML) and carried out the refocusing construction and subsequent
program transformations in a step-by-step manner closely following the lecture
notes of Danvy [12]. We found using SML as a metalanguage helpful due to its
type system and its warnings about non-exhaustive and redundant pattern matches.
For example, we were able to encode Guha et al.'s soundness theorem, which is false
without the modification to the semantics described earlier, in SML
in such a way that the type of the one-step reduction relation, coupled with
exhaustive pattern matching, implies a program is either a value or can make
progress.

We ported our semantics and concrete machines to PLT Redex [TS] and then 
built their abstractions. This was done because PLT Redex supports program- 
ming with relations and includes a property-based random testing mechanism. 
The support for programming with relations is an important aspect for building 
the non-deterministic transition systems of the abstracted JAM machines since, 
unlike their concrete counterparts, the transition system cannot be encoded as a 
function in a straightforward way. Using the random testing framework [53] , we 
tested the correspondence, correctness, and soundness theorems. As an added 
benefit, we were able to visualize our test programs' state-spaces using the in- 
cluded graphical tools. 

Finally, we used Guha et al.'s code for desugaring in order to test our
framework on real JavaScript code. We tested against the same test bed as Guha et
al.: a significant portion of the Mozilla JavaScript test suite; about 5,000 lines of
unmodified code. We tested the closure-based semantics of λρJS for
correspondence against the substitution-based semantics of λJS and tested the machines
for correctness with respect to the λρJS semantics. Finally, we tested the
instantiations of our analytic framework for soundness with respect to the machines.
Since the semantics of λJS have been validated against the output of Rhino, V8,
and SpiderMonkey, and all of our semantic artifacts simulate or approximate λJS,
these tests substantiate our framework's correctness.

5 Related work 

Our approach fits cleanly within the progression of work in abstract
interpretation [9,10] and is inspired by the pioneering work on higher-order program
analysis by Jones [22]. Like Jones, our work centers around machine-based ab- 
stractions of higher-order languages; and like Jones [3S], we have obtained our 
machines by program transformations of high-level semantic descriptions in the 
tradition of Reynolds [33]. We have been able to leverage the refocusing approach
of Danvy et al. to systematically derive such machines [13,4,12], and our main
technical insight has been that threading bindings (but not continuations)
through the store results in a straightforward and clearly sound framework that
precisely reasons about control flow in the face of higher-order functions and
sophisticated control operators.

5.1 Pushdown analyses 

The most closely related work to ours is Vardoulakis and Shivers's recent work
on CFA2 [40]. CFA2 is a table-driven summarization algorithm that exploits the
balanced nature of calls and returns to improve return-flow precision in a
control-flow analysis for CPS programs. Though CFA2 alludes to exploiting context-free
languages, context-free languages are not explicit in its formulation in the same
way that pushdown systems are in pushdown control-flow analysis [14]. With
respect to CFA2, the pushdown analysis presented here is potentially polyvariant
and in direct style.

On the other hand, CFA2 distinguishes stack-allocated and store-allocated 
variable bindings, whereas our formulation of pushdown control-flow analysis 
does not and allocates all bindings in the store. If CFA2 determines a binding 
can be allocated on the stack, that binding will enjoy added precision during the 
analysis and is not subject to merging like store-allocated bindings. 

Recently, Vardoulakis and Shivers have extended CFA2 to analyze programs
containing the control operator call-with-current-continuation [41]. The
operator, abbreviated call/cc, works as follows: at the point it is applied, it
reifies the continuation as a procedure; when that procedure is applied, it aborts
the call's continuation and installs the reified continuation. CFA2 is able to
analyze this powerful control operator, which is able to encode all of the control
operators considered here, but without the same guarantees of precision as this
work is able to provide for the weaker notion of exceptions and breaks. So while
CFA2 can analyze call/cc, it does so with the potential for loss of precision
about the control stack; indeed, this appears to be inherently necessary for any
computable analysis of call/cc as the operator does not obey a stack discipline.
Vardoulakis has implemented CFA2 for JavaScript as the "Doctor JS" tool.

The current work also draws on CFL- and pushdown-reachability analy- 
sis [512413115^ . CFL-reachability techniques have also been used to compute 
classical finite-state abstraction CFAs [531 and type-based polymorphic control- 
flow analysis [30]. These analyses should not be confused with pushdown
control-flow analysis: our results demonstrate how to compute a fundamentally more
precise kind of CFA, while the work on CFL-reachability has shown how to cast
classical analyses, such as 0CFA, as a reachability problem for a context-free
language.

5.2 JavaScript analyses 

Thiemann [37] develops a type system for Core JavaScript, a restricted subset
of JavaScript. The type system rules out the application of non-functions,
applying primitive operations to values of incorrect base type, dereferencing fields
of the undefined or null value, and referring to unbound variables. Jensen,
Møller, and Thiemann [21] develop an abstract interpretation computing type
inference. It builds on the type system of Thiemann [37], using it as inspiration
for their abstract domains. Richards et al.'s landmark empirical survey of
JavaScript code [31] made it clear that for JavaScript analyses to work in the
wild, it is not sufficient to handle only a well-behaved core of the language.
Capturing ill-behaved parts of JavaScript soundly and precisely was a major
motivation for our research.

Doctor JS: http://doctorjs.org/

Subsequently, Heidegger and Thiemann have extended the type system with
a notion of recency to improve precision [18] and Jensen et al. have developed
a technique of lazy propagation to increase the feasibility of the analysis [20].
Balakrishnan and Reps pioneered the idea of recency [3]: objects are considered
recent when created and given a singleton type and treated flow-sensitively until
"demoted" to a summary type that is treated flow-insensitively. Recency enables
strong update in analyses [19], which is important for reasoning precisely about
initialization patterns in JavaScript programs. Recency and lazy propagation
are orthogonal to our analytic framework: in our recent work, we show how to
incorporate a generalization of recency into a machine-based static analysis [26]
through the concept of anodization.

Guha, Krishnamurthi, and Jim [16] developed an analysis for intrusion detection
of JavaScript, driven in part by an adaptation of k-CFA to a large subset of
JavaScript. Our work differs from their work in that we are formally
guaranteeing soundness with respect to a concrete semantics, we provide fine-grained
control over precision for our finite-state analysis, and we also provide a
pushdown analysis for handling the complex non-local control features which pervade
JavaScript code. (Guha et al. make a best-effort attempt at soundness and
dynamically detect violations of soundness in empirical trials, violations which they
use to refine their analysis.)

Chugh et al. [6] present a staged information-flow analysis of JavaScript.
In effect, their algorithm partially evaluates the analysis with respect to the 
available JavaScript to produce a residual analysis. When more code becomes 
available, the residual analysis resumes. Our own framework is directly amenable 
to such partial evaluation for handling constructs like eval: explore the state- 
space aggressively, but do not explore past eval states. The resulting partial 
abstract transition graph is sound until the program encounters eval. At this 
point, the analysis may be resumed with the code supplied to eval. 
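
The staged exploration described above can be sketched as an ordinary worklist search that refuses to step past designated states. This is only an illustration: the transition function `step` and the predicate `is_eval` are assumed to be supplied by the analysis, and the state names in the usage note are made up.

```python
from collections import deque


def explore(start, step, is_eval):
    """Build a partial transition graph, stopping at eval states.

    Returns (edges, frontier): the explored edges, and the eval states at
    which exploration stopped and from which it can later be resumed.
    """
    seen = {start}
    edges = set()
    frontier = set()
    work = deque([start])
    while work:
        state = work.popleft()
        if is_eval(state):
            frontier.add(state)  # do not explore past eval
            continue
        for successor in step(state):
            edges.add((state, successor))
            if successor not in seen:
                seen.add(successor)
                work.append(successor)
    return edges, frontier
```

Given a graph in which `s1` steps to a state named `eval#1`, the search records the edge into `eval#1` and reports it in the frontier, but explores nothing beyond it. Once the code supplied to eval is known, calling `explore` again from each frontier state with an extended `step` function resumes the analysis.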

6 Conclusions and perspective 

We present a principled, systematic derivation of machine-based analysis for
JavaScript. By starting with an established formal semantics and transforming
it into an abstract machine, we soundly capture JavaScript in full, quirks
and all. The abstraction of this machine yields a robust finite-state framework
for the static analysis of JavaScript, capable of instantiating the equivalent of
traditional techniques such as k-CFA and CPA. Finding the traditional
finite-state approach wanting in precision for JavaScript's extensive use of non-local
control, we extend the theory of systematic abstraction of abstract machines
from finite-state to pushdown. These decidable pushdown machines precisely
model the structure of the program stack, and so do not lose precision in the
presence of control constructs that depend on it, such as recursion or complex
exceptional control-flow.

https://github.com/dvanhorn/jam/